NLP on Financial Statements

10-Ks

Criteria Meet Specification

Get Documents

The function get_documents extracts the documents from the text.
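A minimal sketch of one way to meet this criterion, assuming the filing text wraps each document in <DOCUMENT> ... </DOCUMENT> tags (the layout of EDGAR full-submission text files). The sample filing and the regular expression are illustrative assumptions, not part of the rubric.

```python
import re


def get_documents(text):
    """Return the text found between each <DOCUMENT> and </DOCUMENT> pair."""
    # Assumption: documents are delimited by <DOCUMENT> tags and are not nested.
    # DOTALL lets "." span newlines; the lazy ".*?" stops at the first closing tag.
    pattern = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
    return [match.group(1) for match in pattern.finditer(text)]


# Hypothetical, heavily shortened filing used only for illustration.
sample_filing = (
    "<SEC-DOCUMENT>header\n"
    "<DOCUMENT>\n<TYPE>10-K\nfirst body\n</DOCUMENT>\n"
    "<DOCUMENT>\n<TYPE>EX-21\nsecond body\n</DOCUMENT>\n"
    "</SEC-DOCUMENT>"
)
documents = get_documents(sample_filing)
```

The tags themselves are dropped, so each returned string holds only the document body.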

Get Document Types

The function get_document_type returns the document type in lowercase.
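A possible sketch, assuming each document carries its type on a line of the form <TYPE>10-K. Returning an empty string when no type line is found is a choice made here for illustration; the rubric does not specify that case.

```python
import re


def get_document_type(doc):
    """Return the lowercased value that follows the <TYPE> tag, or ''."""
    # Assumption: the type is the rest of the line after the <TYPE> tag.
    match = re.search(r"<TYPE>([^\n]+)", doc)
    return match.group(1).strip().lower() if match else ""


# Hypothetical document fragment used only for illustration.
doc_type = get_document_type("\n<TYPE>10-K\n<SEQUENCE>1\nbody text\n")
```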

Preprocess the Data

Criteria Meet Specification

Lemmatize

The function lemmatize_words lemmatizes verbs.
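The shape of such a function might look like the sketch below. To stay self-contained it takes the verb lemmatizer as a parameter and uses a tiny hypothetical lookup table; in practice a real lemmatizer such as NLTK's WordNetLemmatizer, called as lemmatize(word, pos='v'), could be passed instead. The second parameter and the lookup table are assumptions made for this example.

```python
def lemmatize_words(words, lemmatize_verb):
    """Apply a verb lemmatizer to every word and return the new list."""
    return [lemmatize_verb(word) for word in words]


# Hypothetical stand-in for a real lemmatizer: it knows three verb forms only.
TOY_VERB_LEMMAS = {"reported": "report", "increasing": "increase", "was": "be"}


def toy_lemmatize_verb(word):
    """Return the verb lemma if the word is in the toy table, else the word."""
    return TOY_VERB_LEMMAS.get(word, word)


lemmas = lemmatize_words(["revenue", "was", "increasing"], toy_lemmatize_verb)
```

Words that the lemmatizer does not recognize as verb forms pass through unchanged, so the output list always has the same length as the input.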

Analysis on 10-Ks

Criteria Meet Specification

Bag of Words

The function get_bag_of_words generates a bag of words from documents.
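One way to picture the output is a count matrix with one row per document and one column per word. The sketch below assumes the function also receives a fixed vocabulary (for example, a list of sentiment words); that parameter, the tokenizer, and the sample data are assumptions made for illustration.

```python
import re
from collections import Counter


def get_bag_of_words(vocabulary, docs):
    """Return one row per document holding the count of each vocabulary word."""
    rows = []
    for doc in docs:
        # Simple tokenizer (an assumption): lowercase runs of ASCII letters.
        counts = Counter(re.findall(r"[a-z]+", doc.lower()))
        # A Counter returns 0 for words that never occur in the document.
        rows.append([counts[word] for word in vocabulary])
    return rows


# Hypothetical vocabulary and documents.
vocabulary = ["loss", "gain", "risk"]
docs = ["Risk of loss, more risk", "gain after gain", "no listed terms"]
bag = get_bag_of_words(vocabulary, docs)
```

Words outside the vocabulary are ignored, so a document with none of the listed words yields a row of zeros.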

Jaccard Similarity

The function get_jaccard_similarity calculates the Jaccard similarity between each pair of neighboring documents.
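As a sketch of the calculation, each row of a bag-of-words matrix can be reduced to the set of words it contains, and each document compared with the next one using |A ∩ B| / |A ∪ B|. The input layout (a list of count rows) and the value 0.0 for two empty documents are assumptions made here.

```python
def get_jaccard_similarity(bag_of_words_matrix):
    """Return the Jaccard similarity of each row with the row that follows it."""
    # Reduce every count row to the set of column indices with a nonzero count.
    present = [
        {index for index, count in enumerate(row) if count > 0}
        for row in bag_of_words_matrix
    ]
    similarities = []
    for current, following in zip(present, present[1:]):
        union = current | following
        # Assumption: two documents with no words at all score 0.0.
        score = len(current & following) / len(union) if union else 0.0
        similarities.append(score)
    return similarities


# Hypothetical count matrix: three documents, three vocabulary words.
matrix = [[1, 0, 2], [0, 2, 1], [0, 1, 0]]
jaccard = get_jaccard_similarity(matrix)
```

For n documents the result holds n - 1 values, one per neighboring pair.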

TFIDF

The function tfidf generates a TFIDF vector for each document.
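A hand-rolled sketch of the idea follows. It assumes raw term counts for TF, a smoothed IDF of ln((1 + N) / (1 + df)) + 1, and L2-normalized rows; this is one common weighting, not necessarily the one the project expects. The vocabulary is built from the documents themselves, and the tokenizer and sample documents are illustrative.

```python
import math
import re
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors), with one L2-normalized row per document."""
    # Simple tokenizer (an assumption): lowercase runs of ASCII letters.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    idf = {
        term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        for term in vocabulary
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[term] * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(value * value for value in row))
        # Leave all-zero rows untouched to avoid dividing by zero.
        vectors.append([value / norm for value in row] if norm else row)
    return vocabulary, vectors


# Hypothetical documents.
docs = ["risk risk loss", "gain loss", "gain gain"]
vocabulary, vectors = tfidf(docs)
```

Terms that appear in fewer documents receive a larger IDF, so "risk" outweighs "loss" in the first document both because it occurs twice and because it is rarer across the set.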

Cosine Similarity

The function get_cosine_similarity calculates the cosine similarities for each neighboring TFIDF vector/document.
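The comparison can be sketched as the dot product of two vectors divided by the product of their lengths, applied to each vector and the one after it. The list-of-rows input and the value 0.0 for a zero vector are assumptions made for this example.

```python
import math


def get_cosine_similarity(tfidf_matrix):
    """Return the cosine similarity of each row with the row that follows it."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        # Assumption: a zero vector is treated as having similarity 0.0.
        return dot / norm if norm else 0.0

    return [cosine(u, v) for u, v in zip(tfidf_matrix, tfidf_matrix[1:])]


# Hypothetical two-term vectors for three consecutive documents.
tfidf_matrix = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
cosine_similarities = get_cosine_similarity(tfidf_matrix)
```

When the rows are already L2-normalized, the denominator is 1 and the score reduces to the dot product.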